38 research outputs found

    Entity finding in a document collection using adaptive window sizes

    Traditional search engines work by returning a list of documents in response to queries. However, such engines are often inadequate when the information need of the user involves entities. This issue has led to the development of entity-search, which unlike normal web search does not aim at returning documents but names of people, products, organisations, etc. Some of the most successful methods for identifying relevant entities were built around the idea of a proximity search. In this thesis, we present an adaptive, well-founded, general-purpose entity finding model. In contrast to the work of other researchers, where the size of the targeted part of the document (i.e., the window size) is fixed across the collection, our method uses a number of document features to calculate an adaptive window size for each document in the collection. We construct a new entity finding test collection called the ESSEX test collection for use in evaluating our method. This collection represents a university setting as the data was collected from the publicly accessible webpages of the University of Essex. We test our method on five different datasets including the W3C Dataset, CERC Dataset, UvT/TU Datasets, ESSEX dataset and the ClueWeb09 entity finding collection. Our method provides a considerable improvement over various baseline models on all of these datasets. We also find that the document features considered for the calculation of the window size have differing impacts on the performance of the search. These impacts depend on the structure of the documents and the document language. As users may have a variety of search requirements, we show that our method is adaptable to different applications, environments, types of named entities and document collections

    Proactive detection of DDOS attacks in Publish-Subscribe networks

    Information centric networking (ICN) using architectures such as Publish-Subscribe Internet Routing Paradigm (PSIRP) or Publish-Subscribe Internet Technology (PURSUIT) has been proposed as an important candidate for the Internet of the future. ICN is an emerging research area that proposes a transformation of the current host centric Internet architecture into an architecture where information items are of primary importance. This change allows network functions such as routing and locating to be optimized based on the information items themselves. Bloom filter based content delivery is a source routing scheme that is used in the PSIRP/PURSUIT architectures. Although this mechanism solves many issues of today’s Internet, such as the growth of the routing table and the scalability problems, it is vulnerable to distributed denial-of-service (DDoS) attacks. In this paper, we present a new content delivery scheme that has the advantages of the Bloom filter based approach while at the same time being able to prevent DDoS attacks on the forwarding mechanism. Our security analysis suggests that with the proposed approach, the forwarding plane is able to resist attacks such as DDoS with very high probability.
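    As a rough illustration of the baseline idea (not the new scheme proposed in the paper), Bloom filter based source routing can be sketched as follows: each link gets a sparse bit-vector identifier, the sender ORs the identifiers of all links on the delivery path into one in-packet filter, and a forwarding node sends the packet over every outgoing link whose identifier is fully contained in that filter. The filter width, the number of bits per link, the hash construction and all function names below are assumptions made for this sketch.

```python
import hashlib

M_BITS = 256   # filter width in bits (assumed value for this sketch)
K_HASHES = 5   # bits set per link identifier (assumed value for this sketch)


def link_id(name: str) -> int:
    """Hypothetical link identifier: an M_BITS-wide integer with at most K_HASHES bits set."""
    bits = 0
    for i in range(K_HASHES):
        digest = hashlib.sha256(f"{name}:{i}".encode()).digest()
        bits |= 1 << (int.from_bytes(digest[:4], "big") % M_BITS)
    return bits


def build_filter(path_links) -> int:
    """OR together the identifiers of every link on the delivery path."""
    in_packet_filter = 0
    for name in path_links:
        in_packet_filter |= link_id(name)
    return in_packet_filter


def should_forward(in_packet_filter: int, outgoing_link: str) -> bool:
    """Forward on a link when all bits of its identifier are set in the filter."""
    lid = link_id(outgoing_link)
    return (in_packet_filter & lid) == lid


path = ["A->B", "B->C", "C->D"]
packet_filter = build_filter(path)
print([should_forward(packet_filter, link) for link in path])
```

    Links on the path always match. Links off the path can match by chance (a false positive), which is one reason forwarding based purely on such filters can be abused to inject unwanted traffic.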

    Enhancing the Performance of SQL Injection Attack Detection through Probabilistic Neural Networks

    SQL injection is considered one of the most dangerous vulnerabilities, exploited to leak sensitive information, gain unauthorized access, and cause financial loss to individuals and organizations. Conventional defense approaches use static and heuristic methods to detect previously known SQL injection attacks. Existing research uses machine learning techniques that have the capability of detecting previously unknown and novel attack types. Taking advantage of deep learning to improve detection accuracy, we propose using a probabilistic neural network (PNN) to detect SQL injection attacks. To select the best value for the smoothing parameter, we employed the BAT algorithm, a metaheuristic algorithm for optimization. In this study, a dataset consisting of 6000 SQL injections and 3500 normal queries was used. Features were extracted based on tokenizing and a regular expression and were selected using Chi-Square testing. The features used in this study were collected from the network traffic and SQL queries. The experiment results show that our proposed PNN achieved an accuracy of 99.19% with a precision of 0.995%, a recall of 0.981%, and an F-Measure of 0.928% when employing a 10-fold cross-validation compared to other classifiers in different scenarios.
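    To make the role of the smoothing parameter concrete, here is a minimal sketch of a generic probabilistic neural network: every training pattern contributes a Gaussian kernel, each class score is the average kernel response of its patterns, and the class with the highest score wins. The toy feature vectors, labels and the value of `sigma` are made up for illustration; the paper's features and its BAT-based search for the smoothing parameter are not reproduced here.

```python
import math


def pnn_predict(x, train, sigma):
    """Classify x with a basic PNN.

    train maps each class label to a list of feature vectors;
    sigma is the smoothing parameter (kernel width).
    Returns the winning label and the per-class scores.
    """
    scores = {}
    for label, patterns in train.items():
        total = 0.0
        for p in patterns:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        scores[label] = total / len(patterns)
    return max(scores, key=scores.get), scores


# Hypothetical two-feature training data (e.g. token counts per query).
train = {
    "normal": [[0.0, 0.0], [0.0, 1.0]],
    "injection": [[5.0, 5.0], [6.0, 5.0]],
}
label, _ = pnn_predict([0.2, 0.3], train, sigma=1.0)
print(label)
```

    A small `sigma` makes the classifier behave like nearest-neighbour matching, while a large one blurs the classes together, which is why the value is usually tuned rather than fixed by hand.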

    Deep Dive into Fake News Detection: Feature-Centric Classification with Ensemble and Deep Learning Methods

    The online spread of fake news on various platforms has emerged as a significant concern, posing threats to public opinion, political stability, and the dissemination of reliable information. To address this issue, researchers have turned to advanced technologies, including machine learning (ML) and deep learning (DL) techniques, to detect and classify fake news. This research study explores fake news classification using diverse ML and DL approaches. We utilized a well-known “Fake News” dataset sourced from Kaggle, encompassing a labelled news collection. We implemented diverse ML models, including multinomial naïve Bayes (MNB), Gaussian naïve Bayes (GNB), Bernoulli naïve Bayes (BNB), logistic regression (LR), and passive aggressive classifier (PAC). Additionally, we explored DL models, such as long short-term memory (LSTM), convolutional neural networks (CNN), and CNN-LSTM. We compared the performance of these models based on key evaluation metrics, such as accuracy, precision, recall, and the F1 score. Additionally, we conducted cross-validation and hyperparameter tuning to ensure optimal performance. The results provide valuable insights into the strengths and weaknesses of each model in classifying fake news. We observed that DL models, particularly LSTM and CNN-LSTM, showed better performance compared to traditional ML models. These models achieved higher accuracy and demonstrated robustness in classification tasks. These findings emphasize the potential of DL models to tackle the spread of fake news effectively and highlight the importance of utilizing advanced techniques to address this challenging problem.
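    The four evaluation metrics named in the abstract can be computed from a binary confusion matrix as in the sketch below. The counts in the usage example are invented purely for illustration and are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives, with "fake" treated as the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for a hypothetical classifier.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```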

    Hybrid Clustering and Routing Algorithm with Threshold-Based Data Collection for Heterogeneous Wireless Sensor Networks

    The concept of the internet of things (IoT) motivates us to connect bulk isolated heterogeneous devices to automate report generation without human interaction. Energy-efficient routing algorithms help to prolong the network lifetime of these energy-restricted smart devices that are connected by means of wireless sensor networks (WSNs). Current vendor-level advancements enable algorithm-level flexibility to design protocols to concurrently collect multiple application data while enforcing the reduction of energy expenditure to gain commercial success in the industrial stage. In this paper, we propose a hybrid clustering and routing algorithm with threshold-based data collection for heterogeneous wireless sensor networks. In our proposed model, homogeneous and heterogeneous nodes are deployed within specific regions. Threshold-based conditions are presented to prevent unnecessary data transmission when minor or no change is observed in the simulated and real-world applications. We further extend our proposed multi-hop model to achieve more network stability in dense and larger network areas. Our proposed model shows enhancement in terms of load balancing and end-to-end delay as compared to other energy-efficient routing protocols, such as the threshold-sensitive stable election protocol (TSEP), threshold distributed energy-efficient clustering (TDEEC), low-energy adaptive clustering hierarchy (LEACH), and threshold-sensitive energy-efficient sensor network (TEEN).
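    The idea of suppressing transmissions when a reading has barely changed can be sketched with TEEN-style hard and soft thresholds: a node reports only when the sensed value has reached the hard threshold and differs from the last reported value by at least the soft threshold. This is a generic illustration; the exact conditions used in the paper are not given in the abstract, and the function name and sample readings are assumptions.

```python
def readings_to_send(readings, hard_threshold, soft_threshold):
    """Return (time_index, value) pairs that a node would actually transmit.

    A reading is sent only if it is at or above hard_threshold and either
    nothing has been sent yet or it differs from the last sent value by
    at least soft_threshold.
    """
    sent = []
    last_sent = None
    for t, value in enumerate(readings):
        if value < hard_threshold:
            continue  # not interesting enough to report
        if last_sent is None or abs(value - last_sent) >= soft_threshold:
            sent.append((t, value))
            last_sent = value
    return sent


# Hypothetical temperature samples: only 3 of the 6 are transmitted.
print(readings_to_send([20, 31, 31.5, 33, 29, 36], hard_threshold=30, soft_threshold=2))
```

    Raising the soft threshold saves more energy at the cost of coarser reporting, so the two values are typically chosen per application.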

    CCrFS: Combine Correlation Features Selection for Detecting Phishing Websites Using Machine Learning

    Internet users are continually exposed to phishing as a cybercrime in the 21st century. The objective of phishing is to obtain sensitive information by deceiving a target and using the information for financial gain. The information may include login details, passwords, date of birth, credit card number, bank account number, and family-related information. To acquire these details, users are directed to fill out the information on false websites based on information from emails, adverts, text messages, or website pop-ups. Examining the website’s URL address is one method for avoiding this type of deception. Identifying the features of a phishing website URL takes specialized knowledge and investigation. Machine learning is one method that uses existing data to teach machines to distinguish between legitimate and phishing website URLs. In this work, we propose a method that combines correlation and recursive feature elimination to determine which URL characteristics are useful for identifying phishing websites by gradually decreasing the number of features while maintaining accuracy. We use two datasets that contain 48 and 87 features. The first scenario combines predictive power score correlation and recursive feature elimination; the second scenario combines maximal information coefficient correlation and recursive feature elimination; the third scenario combines Spearman correlation and recursive feature elimination. All three scenarios achieve a high level of accuracy even with the smallest feature subset. For dataset 1, the accuracy for the 10-feature result is 97.06%, and for dataset 2 the accuracy is 95.88% for 10 features.
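    As one possible reading of the correlation step, the sketch below ranks features by the absolute Spearman correlation between each feature column and the class label. It only covers the correlation side; how the paper combines this with recursive feature elimination is not described in the abstract, so no attempt is made to reproduce it. The feature names and values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


def rank_features(columns, labels):
    """Order feature names by |Spearman correlation| with the labels."""
    scores = {name: abs(spearman(col, labels)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical URL features for six sites; label 1 = phishing.
columns = {
    "constant_flag": [1, 1, 1, 1, 1, 1],
    "url_length": [10, 80, 15, 95, 12, 70],
}
labels = [0, 1, 0, 1, 0, 1]
print(rank_features(columns, labels))
```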

    Application of a Machine Learning Algorithm for Evaluation of Stiff Fractional Modeling of Polytropic Gas Spheres and Electric Circuits

    Fractional polytropic gas sphere problems and electrical engineering models typically simulated with interconnected circuits have numerous applications in physical, astrophysical phenomena, and thermionic currents. Generally, most of these models are singular-nonlinear, symmetric, and include time delay, which has increased attention to them among researchers. In this work, we explored deep neural networks (DNNs) with an optimization algorithm to calculate the approximate solutions for nonlinear fractional differential equations (NFDEs). The target data-driven design of the DNN-LM algorithm was further implemented on the fractional models to study the rigorous impact and symmetry of different parameters on RL, RC circuits, and polytropic gas spheres. The targeted data generated from the analytical and numerical approaches in the literature for different cases were utilized by the deep neural networks to predict the numerical solutions by minimizing the differences in mean square error using the Levenberg–Marquardt algorithm. The numerical solutions obtained by the designed technique were contrasted with the multi-step reproducing kernel Hilbert space method (MS-RKM), Laplace transformation method (LTM), and Padé approximations. The results demonstrate the accuracy of the design technique as the DNN-LM algorithm overlaps with the actual results with minimum percentage absolute errors that lie between 10^-8 and 10^-12. The extensive graphical and statistical analysis of the designed technique showed that the DNN-LM algorithm is dependable and facilitates the examination of higher-order nonlinear complex problems due to the flexibility of the DNN architecture and the effectiveness of the optimization procedure.

    Numerical Analysis of Electrohydrodynamic Flow in a Circular Cylindrical Conduit by Using Neuro Evolutionary Technique

    This paper analyzes the mathematical model of electrohydrodynamic (EHD) fluid flow in a circular cylindrical conduit with an ion drag configuration. The phenomenon was modelled as a nonlinear differential equation. Furthermore, artificial neural networks (ANNs) with a generalized normal distribution optimization algorithm (GNDO) and sequential quadratic programming (SQP) were utilized to suggest approximate solutions for the velocity, displacements, and acceleration profiles of the fluid by varying the Hartmann electric number (Ha^2) and the strength of nonlinearity (α). ANNs were used to model the fitness function for the governing equation in terms of mean square error (MSE), which was first optimized by GNDO to exploit the global search. Then SQP was implemented to complement its local convergence. Numerical solutions obtained by the design scheme were compared with RK-4, the least square method (LSM), and the orthonormal Bernstein collocation method (OBCM). Stability, convergence, and robustness of the proposed algorithm were endorsed by the statistics and analysis on results of absolute errors, mean absolute deviation (MAD), Theil’s inequality coefficient (TIC), and error in Nash Sutcliffe efficiency (ENSE).